Poster: Arkiver (Administrator, Curator, or Staff) Date: Jan 7, 2014 12:35pm
Forum: faqs Subject: Re: the captures of my site are so sparse

Hmm, I see... They really excluded your site from quite a few crawlers... I also see the folder "/cgi-bin/" listed. What did that folder contain?
Removing the robots.txt file won't exclude your website from search engines. It only tells search engines that there are "no rules" for crawling your website.
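
To illustrate the "no rules" point, here is a minimal sketch using Python's standard urllib.robotparser module. It assumes that a removed robots.txt can be modelled as an empty file, and the crawler name "ExampleBot", the rule text, and the script URL are made up for the example. They are not taken from the real site.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Assumption: a removed robots.txt is modelled here as an empty file.
no_rules = build_parser("")

# Hypothetical rule that keeps every compliant crawler out of /cgi-bin/.
with_rule = build_parser("User-agent: *\nDisallow: /cgi-bin/\n")

# Hypothetical URLs, used only for illustration.
blocked_url = "http://www.medexamtools.com/cgi-bin/script"
page_url = "http://www.medexamtools.com/r6-page.htm"

print(no_rules.can_fetch("ExampleBot", blocked_url))   # no rules: allowed
print(with_rule.can_fetch("ExampleBot", blocked_url))  # rule applies: refused
print(with_rule.can_fetch("ExampleBot", page_url))     # other pages: allowed
```

Note that can_fetch only reports what the file asks for. Whether a crawler respects the answer is up to the crawler.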

I haven't started crawling your whole website yet; I am still discovering links. The program has been running for around 7 hours now and has discovered around 220,000 links. I will probably be finished tomorrow, and then you can see your full website in the Wayback Machine. I'll keep you informed about it!

Yes, as I said in my reply to your other post, it can be very bad for the number of customers if you have two websites that split the traffic between them.
I will also start a crawl of the medexamtools.info website, since you say you are going to do something with it (I'm not sure whether you are going to delete it or change it dramatically).

I'll keep you informed about my progress and you'll see the websites in the wayback machine... ;)

Poster: Medworks Date: Jan 7, 2014 7:18pm
Forum: faqs Subject: Re: the captures of my site are so sparse

Rather than replying separately to the 3 replies you've posted since I last looked, I'll consolidate my answers into 1 post to be concise and less confusing. I don't know if you (Arkiver) are also Michael Ronayne, but I'll answer that one separately.

I don't know what was in cgi-bin. I don't seem to have one now, anywhere on the site. When I try putting medexamtools.com/cgi-bin into any of the Wayback Machine archives I get a "not archived" message, so I don't know where you're seeing it. If there was a cgi-bin, then it was probably generated by hackers, or it could possibly have been generated by inmotionhosting one of the times when I called customer service and they helped me do something on the site. But I do have a _vti_bin directory, if that's what you meant. It's just something that Microsoft FrontPage generates as part of its operations. If I mess with that too much, the "search our site" function is liable to stop working.

I guess I should probably just get rid of the robots.txt file then. Unless you know of any crawlers that are bad and that I SHOULD exclude.

I see: 2 sites with mediocre results or 1 with twice as many. Basically, one site that gets X hits a day or two sites that each get less than X/2 hits a day. In that case, I wonder what I might do with medexamtools.info. The odd thing, though, is that the search engines already seem to treat the individual pages of medexamtools.com separately. In other words, if I search "dejerine reflex hammer" I don't get medexamtools.com in Google, I get www.medexamtools.com/r6-page.htm. And if I search "troemner reflex hammer", I get 3 consecutive results, numbers 6, 7 and 8 on Google: www.medexamtools.com/troemnernew.htm is 6, www.medexamtools.com/r8-page.htm is 7 and www.medexamtools.com/troemnerstreamlined.htm is 8. So it's more complicated than just one site getting the web traffic or 2 splitting it up; my 1 working website already has split results. I have also never seen medexamtools.info in search results. Obviously I understand your reasoning that it's better to have one website than 2 that do the exact same thing, though. I wonder if I might put the electronics on medexamtools.info because they're a completely different category; after all, old customers know about medexamtools.com. Or maybe I could just have each site use different keywords, though I'm not that good at it, and it would be a massive undertaking. Oh, my medexamtools.info is looking like such a doomed venture. I just wanted to test something to replace the FrontPage site.

medexamtools.info is putting // in a bad place in the links between pages. I have a feeling that's not the biggest problem with poor medexamtools.info, but it's one I'll try to keep in the back of my mind. I really need to try to solve it.
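
The cause of the stray // isn't known from this thread, so the following is only a sketch of one way to clean up such links after the fact, using Python's standard urllib.parse module. The function name collapse_slashes and the example link are hypothetical. The "//" that follows "http:" is part of the URL syntax and is left alone; only repeated slashes inside the path are collapsed.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in the path part of a URL into a single '/'."""
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    # Only the path changes; scheme, host, query and fragment are kept.
    return urlunsplit(parts._replace(path=clean_path))


# Hypothetical broken link, made up for illustration.
broken = "http://www.medexamtools.info//products//hammers.htm"
fixed = collapse_slashes(broken)
print(fixed)
```

This only repairs links that have already been written out. If the page generator itself is joining a folder name that ends in "/" with a file name that starts with "/", fixing that join would be the more durable solution.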

Yeah, being hit by that car ruined a lot. If you want to hear my rant about it, the woman ran a red light to hit me on a crosswalk. Then she had the nerve to tell the cop that I was a crazy jogger who just sprinted out of nowhere and took a swan dive into her windshield while her car was stopped and while she was busy looking to the left and right (actually she wrote left twice on her witness statement and crossed out the second one and wrote right). And that's what the cop said he thought when he interrogated me in my hospital room as I was coming out of a coma. He told me he had 5 witnesses who all agreed on that version of events and was giving me a citation for jaywalking and she got no consequences whatsoever. He was lying. The witnesses agreed with what I said. The police report followed that story, the newspaper and her insurance based off the police report, and I found out weeks later when they made the witness statements available (after the jaywalking ticket was due and after the court date where I might dispute it) that the witnesses all said the same thing I said (or that they didn't witness the actual impact), that she was looking only to the left, only concerned with cars coming from the left, and ignored everything to the right and directly in front of her, and hit the gas and got me. She gave me 100+k$ in medical bills and I had the pleasure of being coerced by a lawyer under threat of paying HIM money I didn't have to accept the 25k$ (all the insurance coverage she had) from allstate which came with the string attached that she was absolved from all further suits. 
She took the time before that to hide her assets, including her divorce settlement, and claimed to only have her fat annuity and fat social security check, both of which are more than I have, even if her lie was true, but she's sitting on all sorts of money, no penalties for perjury about either what happened or her assets (I can't legally do anything to verify if she was telling the truth - and what did she do, spend hundreds of thousands of dollars in 2 years since her divorce and then settle down for being almost as poor as me, I don't believe it), and is still driving around her death machine with 25k insurance coverage, ready to ruin the life of the next person 40 years her junior (play the indy game Turbo Granny, that's pretty much her), no moving citation, doesn't have to take a driving test or anything, and the cop is still probably interrogating semiconscious people in hospital rooms as they come out of comas from having their skulls fractured to drive into them the version of events that involves the least amount of paperwork for him. Also she took my sense of hearing away in my left ear and replaced it with the neverending sound of nails on a chalkboard at 100 decibels, and my IQ dropped from 140 to 110. Hurray for the wheels of justice. It just goes to show you, not EVERYTHING that happens in the US legal system is some trespasser getting bitten by a dog and suing the homeowner for a million dollars, or the old woman who burned herself on mcdonalds coffee and getting a million dollars out of them. But they'd both better hope I'm never in a position of power, I'll tell you that. If I choose to live that long (I can't imagine living like this for years and years and years though, just being away from a loud fan running is absolute agony so I'm a shut-in now), I'm certainly going to show up at her funeral with a westboro baptist church style sign to denigrate her and troll her family though.

Poster: Arkiver (Administrator, Curator, or Staff) Date: Jan 7, 2014 10:42pm
Forum: faqs Subject: Re: the captures of my site are so sparse

Well, you can use robots.txt to tell crawlers not to crawl your website, but it won't work if the crawlers are "bad". The Google crawler, the IA crawler and many other crawlers stick to the rules in robots.txt, but a crawler can also just ignore those rules and do whatever it wants. So you can try to exclude bad crawlers, but it wouldn't help a lot...

I don't think I can help you much right now with your pages and how they show up in Google's search results. I think you should take a look at some SEO (Search Engine Optimization) articles on how to make your site easier to find in Google and other search engines.

What happened to you with the car accident sounds horrible... There are some really horrible and disgusting people on this planet... :(

Poster: Medworks Date: Jan 8, 2014 1:39pm
Forum: faqs Subject: Re: the captures of my site are so sparse

Well, yes, obviously it would be completely outrageous for me to expect you to help me redesign my website(s). I'm astonished you did as much as you did.

Just a little FYI though: I just discovered that there IS in fact a reason to have a crawl delay in there. My website was SHUT DOWN and suspended by my webhost 3 hours ago because they got 100,000 requests from a bot in the Netherlands. If, as you say, the "bad bots" don't obey the delay directive in the text file, then it was completely coincidental that this happened right when I removed the delay from robots.txt. More likely than not, though, it was a consequence of it, which means it was a "good bot", i.e. one I don't want to ban, that just crawled so fast that it angered the webhost. The inmotionhosting representative actually said the opposite of what you did: he said they generally recommend a delay of 30 seconds. But I got him to put in a delay of 1 second before removing the suspension and putting the site live again. So it's apparently bad to remove it entirely. I can only guess that the environment of the internet is different now than it was at the beginning of 2011 because, as you noted, there was a whole 3-year period when there was no robots.txt file at all, and in all that time a single IP address never slammed the website with requests for lack of a time delay in robots.txt. Yet it happened essentially as soon as I removed the delay line from robots.txt 2 days ago, here in 2014. So you might consider that it's good to have a delay after all, just not 30 seconds.
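
For a rough sense of what the two delay values mean, here is a small sketch using Python's standard urllib.robotparser module. The robots.txt text, the crawler name "ExampleBot" and the helper crawl_hours are assumptions made up for the example; only the 1 second and 30 second delays and the 100,000 request figure come from the post above. Crawl-delay is a non-standard extension to robots.txt, so a crawler may ignore it even if it honors the other rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to wait 1 second per request.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 1
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The delay (in seconds) that a compliant crawler would read from the file.
delay = parser.crawl_delay("ExampleBot")
print(delay)


def crawl_hours(requests: int, delay_seconds: float) -> float:
    """Minimum time in hours to make `requests` requests at a fixed delay."""
    return requests * delay_seconds / 3600


# 100,000 requests, as in the post above, at the two delays discussed.
hours_at_1s = crawl_hours(100_000, 1)
days_at_30s = crawl_hours(100_000, 30) / 24
print(round(hours_at_1s, 1), "hours at a 1 second delay")
print(round(days_at_30s, 1), "days at a 30 second delay")
```

Under these assumptions, a 1 second delay spreads the same crawl over a bit more than a day, while a 30 second delay stretches it to about five weeks.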

Well, thanks for all your help. The IP address in the Netherlands wasn't anything associated with Alexa or the Wayback Machine, was it? You said you were doing something or other with my site that would run into the hundreds of thousands. The problem was just that it was too much, too fast.

Poster: Arkiver (Administrator, Curator, or Staff) Date: Jan 8, 2014 10:47pm
Forum: faqs Subject: Re: the captures of my site are so sparse

..... oh...
Gosh, I think I actually am that IP address in the Netherlands... :/
Well, I didn't expect that to happen. I am very, very, very sorry!! :(

This post was modified by Arkiver on 2014-01-09 06:47:55